Abstract

A parameterisation of generalised network clustering, in the form of
four-motif prevalences, is presented. This involves three real parameters
that are conditional on one-, two- and three-motif prevalences.
Interpretations of these real parameters are presented that motivate a set
of rewiring schemes to create appropriately clustered networks. Finally,
the dynamical implications of higher-order structure, as parameterised,
for a contact process are considered.



1 Introduction 

Networks have become one of the indispensable tools for the study of complex
systems with many interacting components, as demonstrated by their ubiquity 
in the Proceedings of the recent European Conference on Complex Systems 
with which this journal issue is concerned. In particular, the combination of 
high clustering amongst nodes and short average path length, commonly known 
as the small world phenomenon [13], has been observed not only in social
networks [11], but also in technological, metabolic and citation networks.

Small connected sub-graphs of complex networks, known as motifs, have also
been observed to have significantly different prevalences from those expected in a 
random case, leading to scientific insight. This paper is concerned with an
alternative approach to motif prevalence that conditions on standard, triangle- 
level clustering, as a guide to intuition for other applications of the concept of 
motifs. In particular, new wirings are presented that modify clustering without 
changing node degree, along the lines of [6], [2].

Networks, and population structure in general, have also become central 
to modern infectious disease epidemiology [5]. The impact of motif structure 
for SIS epidemics was considered in [5] , and we combine the dynamical system 
developed in that work with the new parameterisation to gain insights into the 
impact of higher-order clustering on transmission / contact process dynamics. 

2 Characterisation of motif structure 



We start by considering the relatively simple structure of one-, two- and
three-motif prevalences. At orders one and two, there are only the number of
nodes in a network and the number of links to consider. For simplicity, we
consider networks with a single giant component of N nodes in which each
individual has exactly n links connecting it to the rest of the network. This
assumption is not essential to the general thrust of the analysis presented,
but does simplify an already complex set of manipulations. In our notation, we
use a diagrammatic representation of a node and linked nodes enclosed in square
brackets to denote prevalence of that motif in the network. This means that at
orders one and two:

$$[\bullet] = N, \qquad [-] = nN. \tag{1}$$

So the motif structure at this level is given equivalently either by the raw motif 
prevalences [ • ] , [—] , or by the real numbers N, n. The benefit of the latter 
approach is that n tells us something about the number of links per node — i.e. 
two-motif structure conditional on one-motif structure. 

Less trivially, there are two connected three-motifs: triangles and unclosed 
triples. Since every triple must be either closed or unclosed, the prevalences of 
three-motifs, notated using square brackets and diagrams as for other motifs, 
obey the identity 

$$[\wedge] + [\triangle] = N n (n-1). \tag{2}$$

Here $[\wedge]$ stands for the unclosed-triple diagram and $[\triangle]$ for the
triangle diagram. This means that a real parameter $\phi \in [0, 1]$ can be
introduced to partition this identity as below:

$$[\wedge] = N n (n-1)(1-\phi), \qquad [\triangle] = N n (n-1)\phi. \tag{3}$$

In network analysis, $\phi$ (the ratio of triangles to all triples, closed and unclosed)
is often called the clustering coefficient. In the same way that n conditions
on network size, $\phi$ conditions on network size and number of links to measure
transitivity of the network in a different manner from raw counts of triangles.
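
As a concrete illustration of the identity and of the clustering coefficient, the following minimal sketch counts ordered triples on a small regular network. The circulant test graph and the helper names `circulant` and `triple_counts` are assumptions made for this example and do not come from the paper; triples are counted as ordered node sequences, which is the convention under which the total equals Nn(n-1).

```python
def circulant(num_nodes, offsets):
    """Adjacency sets of a circulant graph: node i is linked to i +/- k (mod num_nodes)."""
    adj = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for k in offsets:
            adj[i].add((i + k) % num_nodes)
            adj[i].add((i - k) % num_nodes)
    return adj


def triple_counts(adj):
    """Count ordered triples (a, b, c) with links a-b and b-c, split by whether a-c is linked."""
    unclosed = closed = 0
    for b, neighbours in adj.items():
        for a in neighbours:
            for c in neighbours:
                if a == c:
                    continue
                if c in adj[a]:
                    closed += 1      # contributes to the triangle prevalence
                else:
                    unclosed += 1    # contributes to the unclosed-triple prevalence
    return unclosed, closed


N, n = 10, 4                         # ten nodes, each with exactly four links
adj = circulant(N, (1, 2))           # hypothetical test network
unclosed, closed = triple_counts(adj)
phi = closed / (unclosed + closed)   # clustering coefficient
print(unclosed + closed == N * n * (n - 1), phi)  # prints: True 0.5
```

In this test graph the only triangles are the ten sets of three consecutive nodes, each counted six times as an ordered triple, which gives the value of one half.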

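We now attempt a similar parameterisation at order four. There are six
connected graphs of size four: the three-pronged star, the four-line, the
triangle with a pendant node, the square, the 'envelope' (a square with one
diagonal) and the four-clique. These can be represented pictorially, and we
denote their prevalences by the following set of symbols:

$$[\text{star}],\ [\text{line}],\ [\text{tailed triangle}],\ [\square],\ [\text{envelope}],\ [\text{clique}].$$

A set of identities analogous to (2) was introduced in [3],

$$
\begin{aligned}
[\text{tailed triangle}] + 2[\text{envelope}] + [\text{clique}] &= (n-2)[\triangle], \\
[\text{star}] + 2[\text{tailed triangle}] + [\text{envelope}] &= (n-2)[\wedge], \\
[\text{line}] + [\text{tailed triangle}] + [\square] + [\text{envelope}] &= (n-1)[\wedge].
\end{aligned}
\tag{4}
$$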

Each of these identities is derived by starting with the three-motif appearing on
the right-hand side of the identity (either a triangle or unclosed triple) and then 
joining a fourth node to one of the original three; the left-hand side of each 
identity can then be seen as an enumeration of the possible additional links 
between the new node and the two other nodes within the original three-motif. 
We now propose the main innovation of this work, a partition of these identities
in terms of three real parameters, $\psi$, $\zeta$ and $\xi$.
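
The three four-motif identities can be checked numerically on any small n-regular network. The sketch below makes an assumption about the counting convention: each prevalence is taken to be the number of induced copies of the motif multiplied by the number of symmetries of that motif (6, 2, 2, 8, 4 and 24 for the star, four-line, tailed triangle, square, envelope and four-clique), which is the convention that follows from the node-joining derivation described above. The circulant test graphs and all function names are hypothetical.

```python
from itertools import combinations

# Assumed multiplicities: number of symmetries (automorphisms) of each four-motif.
SYMMETRIES = {"star": 6, "line": 2, "tailed": 2, "square": 8, "envelope": 4, "clique": 24}
# Each connected four-node graph has a distinct sorted degree sequence.
BY_DEGREES = {(1, 1, 1, 3): "star", (1, 1, 2, 2): "line", (1, 2, 2, 3): "tailed",
              (2, 2, 2, 2): "square", (2, 2, 3, 3): "envelope", (3, 3, 3, 3): "clique"}


def circulant(num_nodes, offsets):
    """Adjacency sets of a circulant graph: node i is linked to i +/- k (mod num_nodes)."""
    adj = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for k in offsets:
            adj[i].add((i + k) % num_nodes)
            adj[i].add((i - k) % num_nodes)
    return adj


def three_motifs(adj):
    """Ordered counts of unclosed triples and triangles."""
    unclosed = closed = 0
    for b, neighbours in adj.items():
        for a in neighbours:
            for c in neighbours:
                if a != c:
                    if c in adj[a]:
                        closed += 1
                    else:
                        unclosed += 1
    return unclosed, closed


def four_motifs(adj):
    """Four-motif prevalences from induced subgraphs on every set of four nodes."""
    counts = dict.fromkeys(SYMMETRIES, 0)
    for quad in combinations(adj, 4):
        degrees = tuple(sorted(sum(v in adj[u] for v in quad) for u in quad))
        kind = BY_DEGREES.get(degrees)   # None for disconnected subgraphs
        if kind is not None:
            counts[kind] += SYMMETRIES[kind]
    return counts


def check_identities(adj, n):
    """Return one boolean per identity, in the order triangle, centre-joined, end-joined."""
    unclosed, closed = three_motifs(adj)
    m = four_motifs(adj)
    return (
        m["tailed"] + 2 * m["envelope"] + m["clique"] == (n - 2) * closed,
        m["star"] + 2 * m["tailed"] + m["envelope"] == (n - 2) * unclosed,
        m["line"] + m["tailed"] + m["square"] + m["envelope"] == (n - 1) * unclosed,
    )


result = check_identities(circulant(10, (1, 2)), 4)
print(result)
```

The enumeration over all sets of four nodes is only practical for small networks, which is enough for a consistency check of this kind.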

We start this process by writing down the four-motif prevalences that would
be expected if transitive closure of any given triple is a random event of
constant probability $\phi$. In the case where no triangles at all are present in the
network, the only four-motif clustering structure possible is the closure of
four-lines into squares, and the appropriate motifs obey
$[\text{line}] + [\square] = N n (n-1)^2$.
This motivates the introduction of a square-level partition of these two motifs,
$\psi$, analogous to $\phi$ in (3), but not equivalent in the case where some triangles
are present in the network.
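
The triangle-free relation can be traced back to the last of the four-motif identities. The short derivation below is a sketch; its final line is our assumed reading of the "square-level partition", written by direct analogy with the triangle-level partition, and it applies only when no triangles are present.

```latex
\begin{align*}
[\text{line}] + [\text{tailed triangle}] + [\square] + [\text{envelope}]
  &= (n-1)[\wedge]
  && \text{(end-joined identity)} \\
[\text{tailed triangle}] = [\text{envelope}] = 0, \qquad [\wedge] &= N n (n-1)
  && \text{(no triangles, so } \phi = 0\text{)} \\
\Rightarrow \quad [\text{line}] + [\square] &= N n (n-1)^2 \\
[\text{line}] = N n (n-1)^2 (1 - \psi), \qquad [\square] &= N n (n-1)^2 \, \psi
  && \text{(assumed form of the partition)}
\end{align*}
```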
Finally, we introduce parameters $\zeta$ and $\xi$ additively to the prevalences of the
envelope and four-clique motifs respectively, and then use the identities (4) to carry through
the consequences of this addition to other motif prevalences, yielding the form
Requiring that no motif prevalence be negative, the new parameters sit in the
following ranges, provided each of the others is zero:

$$\psi \in [0, 1], \qquad \zeta \in \left[\max(\ldots),\ \ldots\right], \qquad \xi \in \left[\max\left(-3\phi(1-\phi)^2,\ -\phi^3\right),\ \ldots\,\phi^2(1-\ldots\right]. \tag{6}$$

A neighbourhood-based interpretation of these new parameters for certain
limiting cases is considered in Figure 1. This figure shows a typical neighbourhood
around an individual in a network with n = 6, and clustering parameter values
varied. Plot (a) shows a completely unclustered graph, essentially a Cayley
tree of degree 6. In (b), triangle-level clustering has been introduced, but in
such a way that the triangles do not form highly connected fourth-order
structures. (c) shows how the 'envelope' shape is more prevalent than would be
expected on the basis of the three-motif structure: this means that $\zeta$ is
positive, while for under-represented envelopes $\zeta$ would be negative. (d) shows that
the 'four-clique' is, in the same way, over-represented, implying positive $\xi$,
while its under-representation would imply negative $\xi$. Finally, (e) shows that
$\psi$ represents the ratio of squares to all four-lines (closed and unclosed), and
involves connections being made further away from the central node than other
clustering parameters. In this plot, as with (b), the squares are shown
maximally uncorrelated; obviously, at still higher orders of clustering, correlations
between squares may be parameterised as for triangles.

Outside of these limiting cases, however, the interpretation of the new
clustering parameters is more subtle, since the consistency conditions (4) are much
more structured than (2). In this sense, the parameterisation of four-motif
structure is not a straightforward extension of the methodology used at the
three-motif level.
3 Rewiring schemes



Rewiring schemes that preserve the number of links attached to a node can play
an important role in understanding, creating and manipulating networks. We 
now present a set of rewiring schemes that modify the clustering parameters 
we have introduced, two from existing work (together with applications) and 
three that are, to our knowledge, novel. These rewirings are an aid to intuition 
and also demonstrate that explicit networks of the kind considered here can 
be generated given sufficient computational resources. Nevertheless, their naive 
implementation is highly computationally intensive, and does not scale well with 
network size, meaning that technical innovation beyond the scope of this paper 
is necessary to produce simulations equivalent to the results obtained below 
using moment closure. 

3.1 Randomiser 

This rewiring was used recently in epidemiological applications [5] to remove
all forms of clustering without changing degree distribution. 
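
The diagram for this rewiring is not reproduced here, so the following is only a sketch under the assumption that the randomiser is the standard degree-preserving double-edge swap: two links a-b and c-d are replaced by a-d and c-b, provided this creates no self-links or duplicate links. The function and variable names are hypothetical.

```python
import random


def circulant(num_nodes, offsets):
    """Adjacency sets of a circulant graph, used here as a clustered starting network."""
    adj = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for k in offsets:
            adj[i].add((i + k) % num_nodes)
            adj[i].add((i - k) % num_nodes)
    return adj


def randomise(adj, attempts, rng):
    """Attempt `attempts` double-edge swaps in place; return how many were applied."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    swaps = 0
    for _ in range(attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                  # choose one of the two possible pairings
        # Reject swaps that would create a self-link or a duplicate link.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        for u, v in ((a, b), (c, d)):    # remove the old links
            adj[u].remove(v)
            adj[v].remove(u)
        for u, v in ((a, d), (c, b)):    # add the new links
            adj[u].add(v)
            adj[v].add(u)
        edges[i], edges[j] = (a, d), (c, b)
        swaps += 1
    return swaps


adj = circulant(30, (1, 2))
applied = randomise(adj, 500, random.Random(1))
```

Every node keeps its number of links after each swap. This sketch does not check that the network remains a single connected component, which a fuller implementation would need to do.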




3.2 'Big V'

This rewiring was considered recently in [1], [3] to increase $\phi$ without changing
the degree distribution.
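
As an illustration only, the sketch below implements one common form of a 'Big V' move from the clustered-network literature, which we assume matches the scheme intended here: a five-node chain z1-y1-x-y2-z2 is selected, the links y1-z1 and y2-z2 are removed, and the links y1-y2 and z1-z2 are added, closing the triangle x-y1-y2. The acceptance rule (keep the move only if the number of triangles rises) and all names are assumptions of this example.

```python
import random


def circulant(num_nodes, offsets):
    """Adjacency sets of a circulant graph: node i is linked to i +/- k (mod num_nodes)."""
    adj = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for k in offsets:
            adj[i].add((i + k) % num_nodes)
            adj[i].add((i - k) % num_nodes)
    return adj


def triangle_count(adj):
    """Number of triangles; each is seen once per ordered link and common neighbour."""
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6


def swap_links(adj, old, new):
    """Remove the links in `old` and add the links in `new`, in place."""
    for u, v in old:
        adj[u].remove(v)
        adj[v].remove(u)
    for u, v in new:
        adj[u].add(v)
        adj[v].add(u)


def big_v_move(adj, rng):
    """Attempt one 'Big V' move; return True if the rewiring was kept."""
    x = rng.choice(sorted(adj))
    if len(adj[x]) < 2:
        return False
    y1, y2 = rng.sample(sorted(adj[x]), 2)
    if y2 in adj[y1]:
        return False                      # x, y1, y2 already form a triangle
    ends1 = sorted(adj[y1] - {x, y2})
    ends2 = sorted(adj[y2] - {x, y1})
    if not ends1 or not ends2:
        return False
    z1, z2 = rng.choice(ends1), rng.choice(ends2)
    if z1 == z2 or z2 in adj[z1]:
        return False                      # new link z1-z2 would be invalid
    old, new = ((y1, z1), (y2, z2)), ((y1, y2), (z1, z2))
    lost = len(adj[y1] & adj[z1]) + len(adj[y2] & adj[z2])
    swap_links(adj, old, new)
    gained = len(adj[y1] & adj[y2]) + len(adj[z1] & adj[z2])
    if gained > lost:
        return True
    swap_links(adj, new, old)             # undo moves that do not add triangles
    return False


adj = circulant(30, (1, 3))               # a triangle-free network with n = 4
tri_before = triangle_count(adj)
rng = random.Random(2)
for _ in range(2000):
    big_v_move(adj, rng)
tri_after = triangle_count(adj)
```

Each of the four rewired nodes loses one link and gains one, so the degree distribution is unchanged, while accepted moves strictly increase the triangle count.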
3.3 'Big U'

This novel rewiring increases $\psi$.

3.4 'YV'

This novel rewiring increases $\zeta$ and $\phi$.





4 Contact-process dynamics 

We now present the model of [3], used to investigate the impact of higher-order
clustering on what epidemiologists call SIS dynamics. In these dynamics, often
called a contact process, individuals are either susceptible (S) or infectious (I),
with letters A, B, C, ... representing either of these states. Transmission of
infection happens between infectious individuals I and susceptible individuals S
linked on the network at a rate $\tau$, while infectious individuals recover and
become susceptible, since recovery is assumed not to offer lasting immunity, at a
rate g. We use square brackets to denote the prevalence of certain structures in
the network.
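
The verbal description of the contact process translates directly into a stochastic simulation. The following is a minimal Gillespie-style sketch, not the moment-closure model analysed in this paper; the function name, the test network and the parameter choices in the usage lines are assumptions made for illustration.

```python
import random


def circulant(num_nodes, offsets):
    """Adjacency sets of a circulant graph: node i is linked to i +/- k (mod num_nodes)."""
    adj = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes):
        for k in offsets:
            adj[i].add((i + k) % num_nodes)
            adj[i].add((i - k) % num_nodes)
    return adj


def sis_gillespie(adj, tau, g, initial_infected, t_max, rng):
    """Simulate SIS dynamics until t_max; return the final infectious proportion."""
    infected = set(initial_infected)
    t = 0.0
    while infected:
        # Every infectious-susceptible link transmits at rate tau.
        si_links = [(i, s) for i in infected for s in adj[i] if s not in infected]
        rate_infection = tau * len(si_links)
        rate_recovery = g * len(infected)    # each infectious node recovers at rate g
        total = rate_infection + rate_recovery
        if total == 0.0:
            break                            # nothing further can happen
        t += rng.expovariate(total)
        if t >= t_max:
            break
        if rate_recovery == 0.0 or rng.random() * total < rate_infection:
            infected.add(rng.choice(si_links)[1])
        else:
            infected.remove(rng.choice(sorted(infected)))
    return len(infected) / len(adj)


adj = circulant(50, (1, 2, 3))               # each node has n = 6 links
final = sis_gillespie(adj, tau=0.6, g=1.0, initial_infected=range(10),
                      t_max=20.0, rng=random.Random(0))
```

Recomputing the list of infectious-susceptible links at every event keeps the sketch short but is inefficient; on a network of 50 nodes a single run is also noisy and can die out by chance.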

4.1 Exact dynamical equations 

The model in question takes as its starting point a set of differential equations
that are, in the $N \to \infty$ limit, exact but form an infinite hierarchy. We present
the first three orders of this:
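$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}[S] &= -\tau [S\text{-}I] + g[I], \\
\frac{\mathrm{d}}{\mathrm{d}t}[I] &= \tau [S\text{-}I] - g[I], \\
\frac{\mathrm{d}}{\mathrm{d}t}[S\text{-}S] &= -2\tau [S\text{-}S\text{-}I] + 2g[S\text{-}I], \\
\frac{\mathrm{d}}{\mathrm{d}t}[S\text{-}I] &= \tau \left([S\text{-}S\text{-}I] - [I\text{-}S\text{-}I] - [S\text{-}I]\right) + g\left([I\text{-}I] - [S\text{-}I]\right), \\
\frac{\mathrm{d}}{\mathrm{d}t}[I\text{-}I] &= 2\tau \left([I\text{-}S\text{-}I] + [S\text{-}I]\right) - 2g[I\text{-}I],
\end{aligned}
$$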

- T (2 [ S-^-^S-I ] + [ S-S^S^I ])+g{2[ S-S-I ] + [ S-I-S ]) 

r ([ S-^-S-I ] - [ • • I ]-[S T ] - [ S-S-I ]) 

+g {[ S-I-I ] + [ I-S-I ] - [ S-S-I ]) , 

+^ ([ ^-^^J ] -2 [ ^rfS-I ] 2 [ S-I-S ]] 
+g{2[ S-I-I \-[ S-I-S \) 



S-S-I ■ ••/]-[ S-I-I- ■■!] 
+ [ S-I-S ] + [ S-S-I ] - [ S-I-I ])+g{[ I-I-I ] -2 [ S-I-I ])

r (2 [ ^-^-J. J ] - [ I-§=T^I ] -2 [ I-S-I ]) 
+g{[ I-I-I ]-2[ I-S-I]) , 

T (2 [ ^ i'l'-'i ] + [ / SL--'I ] +2 [ S-/-7 ] +2 [ 7-5-7 ]) 

- 35 [ 7-7-7 ] , 

- 3r [ S-S-^S -I ] +3.9 [ 5-5'-7 ] , 

T ([ g^.gyg '-/ ] -2 [ 5' S 7-'-'-7 ] -2 [ 5-5-7 ]) 
+fl (2 [ 5-7-7 i - [ 5-5-7 ]) , 

r (2 [ S-§-L--'l ] - [ S-I-i-'-'-'l ] +2 [ 5-5-7 ] -2 [ 5-7-7 ]) 
+5 ([7-7-7] -2 [^7]) , 
3t ([ 5jj" ■ ■ 7 ] +2 [ 5-7-7 ]) - 3ff [ 7-7-7 ] . 
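Here and throughout this paper, we use dotted lines to imply expansion as
below:

$$[A\text{-}B\text{-}C]_{\cdots} = [A\text{-}B\text{-}C]_{\wedge} + [A\text{-}B\text{-}C]_{\triangle}, \tag{8}$$

where the subscript dots mark a triple drawn with a dotted link between A and C,
and the subscripts $\wedge$ and $\triangle$ mark the unclosed triple and the triangle. In
the same way, a four-node term [ A-B-C-D ] drawn with dotted links expands into
the sum of the four motifs obtained by taking each dotted link as present or
absent, and similarly for other fourth-order terms.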
4.2 Closure schemes



To integrate the system as presented so far, we need a closure scheme, previously 
introduced in [3], which is most easily expressed in terms of the raw motif 
prevalences. 
Then (5), together with this closure and equations (7), creates an integrable ODE
system. Provided $\tau$ is sufficiently large compared to g, this system has a steady
state with a non-trivial proportion $I^*$ of the network infectious. Standardly, this
equilibrium value is called the endemic state. We investigate this dynamically
in Figure 2, where $\phi$ is increased with other parameters held constant in all
plots, giving the common black line. We then modify either (a) $\zeta$, (b) $\xi$ or (c)
$\psi$. This shows that, essentially, $\psi$ has a significant but relatively stable effect
in reducing the endemic state at each $\phi$ value, while $\xi$ can have a significant
effect in either direction at larger values. $\zeta$, on the other hand, is relatively
dynamically unimportant, except perhaps at moderate values of $\phi$. We also note
that positive values of $\zeta$ reduce the endemic state, and negative values increase
it, while the opposite is true for $\xi$.
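
To show how an endemic state emerges from a closed system of this kind, the sketch below integrates the first two orders of the hierarchy only, closed at the level of triples with a standard clustered pairwise approximation. This is a simpler, swapped-in model and not the triple-level closure of [3] used for the results reported here; the closure formula, the forward-Euler integration, the function name and the initial condition are all assumptions of this example. The values n = 6, g = 1 and $\tau$ = 3/5 follow our reading of the Figure 2 caption.

```python
def pairwise_sis(n, tau, g, phi=0.0, i0=0.01, dt=0.01, t_max=100.0):
    """Integrate the pairwise SIS equations; return the final infectious proportion."""
    N = 1.0                                   # work with proportions of the network
    S, I = N * (1.0 - i0), N * i0
    # Pair prevalences for an initially well-mixed state; SS + 2*SI + II = n*N.
    SS, SI, II = n * S * (1.0 - i0), n * S * i0, n * I * i0

    def triple(ab, bc, ac, a, b, c):
        """Assumed closure for [A-B-C], with a correction for triangle clustering phi."""
        return (n - 1) / n * ab * bc / b * ((1.0 - phi) + phi * (N / n) * ac / (a * c))

    for _ in range(round(t_max / dt)):
        SSI = triple(SS, SI, SI, S, S, I)
        ISI = triple(SI, SI, II, I, S, I)
        dS = -tau * SI + g * I
        dI = tau * SI - g * I
        dSS = -2.0 * tau * SSI + 2.0 * g * SI
        dSI = tau * (SSI - ISI - SI) + g * (II - SI)
        dII = 2.0 * tau * (ISI + SI) - 2.0 * g * II
        # Forward-Euler step applied to all variables simultaneously.
        S, I = S + dt * dS, I + dt * dI
        SS, SI, II = SS + dt * dSS, SI + dt * dSI, II + dt * dII
    return I / N


endemic = pairwise_sis(n=6, tau=0.6, g=1.0)   # unclustered case, phi = 0
print(round(endemic, 3))
```

For $\phi$ = 0 the steady state of these pairwise equations can be found by hand: it satisfies [S-S] = 2[S] and [S-I] = (5/3)[I] for these parameter values, which gives $I^*$ = 12/17, roughly 0.706. Passing a non-zero `phi` gives a rough pairwise-level view of the effect of triangle clustering, but it cannot represent the four-motif parameters introduced in this paper.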



5 Discussion 

This paper has presented a novel way of thinking about higher-order structure 
in networks, together with intuitive explanations of this, rewiring schemes and 
dynamical consequences. This opens up three main questions. 

Firstly, what are reasonable parameter values for networks that are seen in 
nature, and which can be explicitly constructed? The exact values of clustering 
coefficients considered in the SIS model are perhaps slightly larger than are 
likely to be seen or constructed, although this should be mainly of quantitative 
importance since the qualitative dynamical implications found for higher order 
clustering are not modified at different coefficient values, and moment closure
(particularly in the three-motif case) has been extensively used in modelling 
SIS and SIR dynamics without producing qualitatively incorrect results [S]. 
Nevertheless, if sufficiently efficient methods were available to generate explicit 
networks to run stochastic simulations on, that would significantly increase the 
confidence in the results obtained here using moment closure. 

Secondly, how can this analysis be generalised to networks with heteroge- 
neous numbers of links, and (perhaps more problematically) preferential as- 
sortative connection between nodes of similar degree? While such analysis is 
doubtless possible, the large number of interacting quantities may make
analytic results technically difficult. In particular, it is unlikely that arbitrary
heterogeneity and clustering statistics are compatible with each other. 

Finally, under what conditions is it necessary to consider $k$-motifs for a
given $k$? Clearly, a high preponderance of triangles in a network would favour a
pairwise model, but this answer is less clearly posed for four-motifs in general.
However, the parameterisation suggested here goes some way towards answering
this question: starting with a set of four-motif prevalences, are these significantly
different from what would be predicted based on the values of three-, two- and
one-motif parameters $\phi$, n and N? If the new generalised clustering parameters
$\psi$, $\zeta$ and $\xi$ are significantly different from zero, then we would expect that at
least four-motifs should be considered in analysis of the network.

Figure 1: Interpretation of the clustering parameters $\phi$, $\psi$, $\zeta$, $\xi$ for a typical
neighbourhood in a network with n = 6. Panels: (a) $\phi, \psi = 0$;
(b) $\phi > 0$, $\psi = 0$, minimal $\zeta, \xi$; (e) $\phi = 0$, $\psi > 0$.

Figure 2: Dynamical results for the endemic state of the triplewise contact
process model for n = 6, g = 1, $\tau$ = 3/5 and other parameters varied. Panels:
(a) vary $\zeta$; (b) vary $\xi$; (c) vary $\psi$. Horizontal axis: triangle clustering
coefficient, $\phi$.